Working With Problem Registry

This topic explains the Problem Registry.

Service containers can run into issues, such as a lost database connection, that make it impossible for them to handle requests. In such a situation, the service container can enter the problem state, which indicates that it cannot handle client requests. This is particularly useful in a system that is configured for high availability: if a service container flags a problem, the routing algorithm stops delivering requests to that container and distributes the work over the remaining containers that have no problem.

The system periodically calls back to the component that registered the problem to check whether the problem can now be resolved, for instance by trying to connect to the database again. If the problem is resolved, the service container automatically leaves the problem state and receives requests again. This behavior is called self-healing, a concept from autonomic computing.
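The callback cycle described above can be sketched as follows. This is a minimal illustration only, not the product's API: the ProblemRegistry class, its register, poll, and in_problem_state methods, and the FlakyDatabase helper are all hypothetical names, and the periodic timer is replaced by an explicit loop so that the example stays deterministic.

```python
from typing import Callable, Dict


class ProblemRegistry:
    """Hypothetical in-memory stand-in for illustration; not the product's API."""

    def __init__(self) -> None:
        # problem id -> callback that returns True once the problem is resolved
        self._resolvers: Dict[str, Callable[[], bool]] = {}

    def register(self, problem_id: str, try_resolve: Callable[[], bool]) -> None:
        self._resolvers[problem_id] = try_resolve

    def in_problem_state(self) -> bool:
        # Any registered problem puts the container in the problem state.
        return bool(self._resolvers)

    def poll(self) -> None:
        """One periodic pass: ask each registrant whether its problem is gone."""
        for problem_id, try_resolve in list(self._resolvers.items()):
            if try_resolve():
                del self._resolvers[problem_id]


class FlakyDatabase:
    """Simulated database that rejects a fixed number of connection attempts."""

    def __init__(self, failures: int) -> None:
        self.remaining_failures = failures

    def try_connect(self) -> bool:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            return False
        return True


registry = ProblemRegistry()
database = FlakyDatabase(failures=2)
registry.register("db-connection-lost", database.try_connect)

# Stand-in for the periodic callback: three polling passes.
states = []
for _ in range(3):
    registry.poll()
    states.append(registry.in_problem_state())
```

In this sketch the container stays in the problem state for the first two passes and leaves it on the third, when the simulated reconnect succeeds.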

This feature is implemented through the Problem Registry, a framework that helps application developers register and unregister problems. This framework is based on State SyncUp, which distributes the registered problems across all service containers. These problems can also be retrieved programmatically.

All problems are classified as CRITICAL; that is, there is no hierarchy of severity levels for problems. When a service container enters or leaves the problem state, it issues an alert (see Understanding Alert System), to which administrators or operators can respond.
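Because every problem carries the same severity, the problem state reduces to a yes/no question: is at least one problem registered? The sketch below illustrates that reading. The ContainerState class and its methods are hypothetical, the alert is modeled as a plain string in a list, and the assumption that an alert is raised only on the transition (not for every additional problem) is an interpretation of the text above, not confirmed product behavior.

```python
from typing import List, Set


class ContainerState:
    """Hypothetical model of one service container; not the product's API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._problems: Set[str] = set()
        self.alerts: List[str] = []  # stand-in for the alert system

    def register_problem(self, problem_id: str) -> None:
        was_healthy = not self._problems
        self._problems.add(problem_id)
        if was_healthy:
            # Assumption: alert only when the container enters the problem state.
            self.alerts.append(f"{self.name}: entered problem state")

    def unregister_problem(self, problem_id: str) -> None:
        had_problems = bool(self._problems)
        self._problems.discard(problem_id)
        if had_problems and not self._problems:
            # Assumption: alert only when the last problem is cleared.
            self.alerts.append(f"{self.name}: left problem state")


container = ContainerState("container-A")
container.register_problem("db-connection-lost")
container.register_problem("cache-unreachable")   # already in problem state
container.unregister_problem("db-connection-lost")  # one problem still open
container.unregister_problem("cache-unreachable")   # last problem cleared
```

Under these assumptions, the four calls produce exactly two alerts: one when the first problem is registered and one when the last problem is unregistered.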

An example of working with the Problem Registry is as follows:

  • A service on machine A detects loss of connection with the database. It registers a problem.
  • The web gateway gets a new request for the service. It detects that the service instance on machine A has a problem, so it passes the request to another instance of the service.
  • When the database connection is restored, the service instance on machine A unregisters the reported problem.
  • The web gateway resumes forwarding requests to the service instance on machine A.
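The four steps above can be sketched as a small simulation. Everything here is illustrative: the problems dictionary stands in for the registry view that is shared across service containers, the route function and the instance names are hypothetical, and round-robin selection is an assumption, since the actual routing algorithm is not specified in this topic.

```python
from typing import Dict, List, Set

# Hypothetical shared view: instance name -> ids of registered problems.
problems: Dict[str, Set[str]] = {"machine-A": set(), "machine-B": set()}


def route(instances: List[str], request_no: int) -> str:
    """Pick an instance for a request, skipping instances with problems.

    Round-robin over the healthy instances is an assumption for this sketch.
    """
    healthy = [name for name in instances if not problems[name]]
    if not healthy:
        raise RuntimeError("no service instance available")
    return healthy[request_no % len(healthy)]


instances = ["machine-A", "machine-B"]

# Step 1: the service on machine A loses its database connection.
problems["machine-A"].add("db-connection-lost")

# Step 2: the gateway passes all requests to the other instance.
during_outage = [route(instances, n) for n in range(4)]

# Step 3: the connection is restored and the problem is unregistered.
problems["machine-A"].discard("db-connection-lost")

# Step 4: the gateway resumes forwarding requests to machine A.
after_recovery = [route(instances, n) for n in range(4)]
```

During the simulated outage every request goes to machine-B; after the problem is unregistered, requests alternate between the two instances again.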

Related tasks

Handling Globalization Aspects in Web Application Development